 Crowdsourcing


Erin Brockovich launches a crowdsourced AI data center map

Engadget

Most of the reports so far came from Texas. Erin Brockovich, the American environmental activist portrayed by Julia Roberts in the film named after her, has launched a new project that aims to give people a platform to speak up and voice concerns about AI data centers in their communities. The new Brockovich AI Data Center Reporting website centers on a map showing major operational AI data centers and facilities under construction in the US, along with projects reported by the community. Some of the reports could be for rumored or proposed projects, so not every dot on the map represents a data center that's already running. The website has received 2,716 reports so far, with the biggest chunk coming from Texas.


This viral Dutch Fish Doorbell is peak internet

PCWorld

The Dutch Fish Doorbell mixes livestreams, crowdsourcing, and conservation in all of the best ways. Every spring in the Dutch city of Utrecht, thousands of fish attempt to migrate through the city's canals to reach spawning grounds, but locked flood gates stay shut for long stretches to manage water levels. So the city came up with a weirdly charming solution: a fish doorbell. The site, called Visdeurbel (or Fish Doorbell), lets anyone in the world help the fish out.


SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

Neural Information Processing Systems

Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas, and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observation data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location.
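To make the label in the abstract above concrete, here is a minimal sketch of how an encounter rate could be derived from presence-absence checklists. It assumes the common definition (the fraction of complete checklists at a location that report a species); the input format, the function name `encounter_rates`, and the sample data are hypothetical and not taken from the SatBird pipeline.

```python
from collections import defaultdict

def encounter_rates(checklists):
    """Map each location to {species: fraction of its checklists reporting it}.

    `checklists` is an iterable of (location_id, set_of_species_detected)
    pairs, one pair per complete checklist (hypothetical input format).
    """
    totals = defaultdict(int)                     # checklists per location
    hits = defaultdict(lambda: defaultdict(int))  # detections per location and species
    for location, species_seen in checklists:
        totals[location] += 1
        for species in species_seen:
            hits[location][species] += 1
    # Species never detected at a location are simply absent (rate 0).
    return {
        loc: {sp: n / totals[loc] for sp, n in hits[loc].items()}
        for loc in totals
    }

# Made-up checklists for illustration only.
checklists = [
    ("hotspot_a", {"robin", "crow"}),
    ("hotspot_a", {"robin"}),
    ("hotspot_a", set()),
    ("hotspot_a", {"robin", "heron"}),
    ("hotspot_b", {"crow"}),
]
rates = encounter_rates(checklists)
```

In a setup like the one described, a vector of such rates per location would be the regression target paired with that location's satellite image.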


Triple Eagle: Simple, Fast and Practical Budget-Feasible Mechanisms

Neural Information Processing Systems

We revisit the classical problem of designing Budget-Feasible Mechanisms (BFMs) for submodular valuation functions, which has been extensively studied since the seminal paper of Singer [FOCS'10] due to its wide applications in crowdsourcing and social marketing. We propose TripleEagle, a novel algorithmic framework for designing BFMs, based on which we present several simple yet effective BFMs that achieve better approximation ratios than the state-of-the-art work for both monotone and non-monotone submodular valuation functions. Moreover, our BFMs are the first in the literature to achieve linear complexities while ensuring obvious strategyproofness, making them more practical than the previous BFMs. We conduct extensive experiments to evaluate the empirical performance of our BFMs, and the experimental results strongly demonstrate the efficiency and effectiveness of our approach.


experiments

Neural Information Processing Systems

A.1 Experimental design. Figure 1 summarizes the experimental design used for our experiments. The participants in our experiments are users of the online platform Amazon Mechanical Turk (AMT). Through this platform, users stay anonymous; hence, we do not collect any sensitive personal information about them. We prioritized users with a Master qualification (which is a qualification attributed by AMT to users who have proven to be of excellent quality) or normal users with high qualifications (number of HITs completed = 10000 and HITs accepted > 98%). Before going through the experiment, participants are asked to read and agree to a consent form, which specifies the objective and procedure of the experiment, the expected time to completion (5-8 min) with the associated reward ($1.4), and finally the risks, benefits, and confidentiality of taking part in this study.



Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom

Neural Information Processing Systems

The generation of label noise is often modeled as a process involving a probability transition matrix (also interpreted as the confusion matrix) imposed onto the label distribution. Under this model, learning the "ground-truth classifier" (i.e., the classifier that can be learned if no noise was present) and the confusion matrix boils down to a model identification problem. Prior works along this line demonstrated appealing empirical performance, yet identifiability of the model was mostly established by assuming an instance-invariant confusion matrix. Having an (occasionally) instance-dependent confusion matrix across data samples is apparently more realistic, but inevitably introduces outliers to the model. Our interest lies in confusion matrix-based noisy label learning with such outliers taken into consideration. We begin by pointing out that under the model of interest, using labels produced by only one annotator is fundamentally insufficient to detect the outliers or identify the ground-truth classifier. Then, we prove that by employing a crowdsourcing strategy involving multiple annotators, a carefully designed loss function can establish the desired model identifiability under reasonable conditions. Our development builds upon a link between the noisy label model and a column-corrupted matrix factorization model, based on which we show that crowdsourced annotations distinguish nominal data and instance-dependent outliers using a low-dimensional subspace. Experiments show that our learning scheme substantially improves outlier detection and the classifier's testing accuracy.
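The instance-invariant transition-matrix model that the abstract above starts from can be sketched in a few lines. The sketch assumes the convention `confusion[k][j] = P(noisy label = j | true label = k)`, so each row sums to one. The function name and the numbers are illustrative only; the paper's instance-dependent outliers and its loss function are not modeled here.

```python
def noisy_label_distribution(clean_probs, confusion):
    """Return P(noisy = j | x) = sum_k P(true = k | x) * confusion[k][j].

    Assumed convention: confusion[k][j] = P(noisy = j | true = k).
    """
    num_classes = len(clean_probs)
    return [
        sum(clean_probs[k] * confusion[k][j] for k in range(num_classes))
        for j in range(num_classes)
    ]

# Illustrative two-class example (made-up numbers).
clean_probs = [0.9, 0.1]   # ground-truth classifier output for one sample
confusion = [
    [0.8, 0.2],            # true class 0 is mislabeled as 1 with prob. 0.2
    [0.3, 0.7],            # true class 1 is mislabeled as 0 with prob. 0.3
]
noisy = noisy_label_distribution(clean_probs, confusion)
```

Under this model the annotator's labels follow `noisy` rather than `clean_probs`, which is why recovering both the classifier and the matrix from noisy labels alone becomes an identification problem.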


Crowdsourced Clustering: Querying Edges vs Triangles

Neural Information Processing Systems

We consider the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly; however, it is reasonable to assume that they can compare items and judge whether they are similar or not. An important question is what queries to make, and we compare two types: random edge queries, where a pair of items is revealed, and random triangle queries, where a triple is. Since it is far too expensive to query all possible edges and/or triangles, we need to work with partial observations subject to a fixed query budget constraint. When a generative model for the data is available (and we consider a few of these), we determine the cost of a query by its entropy; when such models do not exist, we use the average response time per query of the workers as a surrogate for the cost. In addition to theoretical justification, through several simulations and experiments on two real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a fixed budget, triangle queries uniformly outperform edge queries. Even though, in contrast to edge queries, triangle queries reveal dependent edges, they provide more reliable edges and, for a fixed budget, many more of them. We also provide a sufficient condition on the number of observations, edge densities inside and outside the clusters, and the minimum cluster size required for the exact recovery of the true adjacency matrix via triangle queries using a convex optimization-based clustering algorithm.
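The two query types compared above can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions: workers answer without noise, queries are drawn uniformly at random, and every query is counted equally instead of being priced by entropy or response time as in the abstract. All function names and the toy clustering are hypothetical.

```python
import random
from itertools import combinations

def same_cluster(labels, i, j):
    """Noise-free worker answer: are items i and j in the same cluster?"""
    return labels[i] == labels[j]

def random_edge_queries(labels, num_queries, rng):
    """Each query shows one random pair and reveals one adjacency entry."""
    observed = {}
    for _ in range(num_queries):
        i, j = sorted(rng.sample(range(len(labels)), 2))
        observed[(i, j)] = same_cluster(labels, i, j)
    return observed

def random_triangle_queries(labels, num_queries, rng):
    """Each query shows one random triple and reveals its three pairwise entries."""
    observed = {}
    for _ in range(num_queries):
        triple = sorted(rng.sample(range(len(labels)), 3))
        for i, j in combinations(triple, 2):
            observed[(i, j)] = same_cluster(labels, i, j)
    return observed

labels = [0, 0, 0, 1, 1, 2, 2, 2]  # hypothetical ground-truth clustering
rng = random.Random(0)
edges = random_edge_queries(labels, 5, rng)
triangles = random_triangle_queries(labels, 5, rng)
```

Both functions return a partial adjacency matrix as a dictionary keyed by `(i, j)` with `i < j`. With the same number of queries, the triangle version fills in up to three entries per query, which gives a rough intuition for the "many more of them" claim; a faithful comparison would also need the per-query cost model the authors describe.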


Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing Systems

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade off between its complexity and its fit to the data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.


Google scraps AI search feature that crowdsourced amateur medical advice

The Guardian

Google had said 'What People Suggest' feature aimed to provide users with information from people with similar lived experiences. Google has dropped a new artificial intelligence search feature that gave users crowdsourced health advice from amateurs around the world. The company had said its launch of "What People Suggest", which provided tips from strangers, showed "the potential of AI to transform health outcomes across the globe". But Google has since quietly removed the feature, according to three people familiar with the decision.